
Conversation

rguan72 (Contributor) commented Sep 25, 2025

  • Update backend/onyx/evals/tracing.py

Description

[Provide a brief description of the changes in this PR]

How Has This Been Tested?

[Describe the tests you ran to verify your changes]

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Introduces a new streaming chat turn pipeline (Simple Agent V2) and a deep-research scratchpad agent, and wires chat processing to the new runner. Adds Braintrust tracing controls and an eval CLI flag to disable log sending.

  • New Features

    • New chat turn infra: OnyxRunner, Emitter, and unified_event_stream for thread-safe, blocking event iteration.
    • fast_chat_turn agent using LiteLLM; process_message now streams via fast_chat_turn and parses agent/reasoning events.
    • Deep-research scratchpad agent with tools (web_search, web_fetch, internal_search), compaction hooks, and a clarification gate.
    • Braintrust updates: configurable masking length (BRAINTRUST_MASKING_LENGTH), tracing processor setup, demo agent, and --no-send-logs flag (propagated through evals).
  • Refactors

    • gather_stream parses unified packet objects and builds responses from streaming deltas.
    • Simplified LLM logging (no-op) and minor DR prompt/CSV cleanup.
    • Temporary change: chat session rename endpoint returns a static placeholder.
    • Eval models/providers accept no_send_logs; added quality classifier scaffold.


vercel bot commented Sep 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | Ready | Preview | Comment | Sep 26, 2025 11:18pm |


blacksmith-sh bot commented Sep 25, 2025

7 Jobs Failed:

Run Integration Tests v2 / prepare-build failed on "Generate OpenAPI schema"
[...]
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: ***/onyx
  pythonLocation: /opt/hostedtoolcache/Python/3.11.13/arm64
  PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.11.13/arm64/lib/pkgconfig
  Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.13/arm64
  Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.13/arm64
  Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.13/arm64
  LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.11.13/arm64/lib
  PYTHONPATH: .
Traceback (most recent call last):
  File "/home/runner/_work/onyx/onyx/backend/scripts/onyx_openapi_schema.py", line 11, in <module>
    from onyx.main import app as app_fn
  File "/home/runner/_work/onyx/onyx/backend/onyx/main.py", line 55, in <module>
    from onyx.evals.tracing import setup_braintrust
  File "/home/runner/_work/onyx/onyx/backend/onyx/evals/tracing.py", line 5, in <module>
    from agents import set_trace_processors
ModuleNotFoundError: No module named 'agents'
Error: Process completed with exit code 1.
Run Integration Tests v2 / required failed on "Run actions/github-script@v7"
[...]
  retry-exempt-status-codes: 400,401,403,404,422
env:
  PRIVATE_REGISTRY: experimental-registry.blacksmith.sh:5000
  PRIVATE_REGISTRY_USERNAME: ***
  PRIVATE_REGISTRY_PASSWORD: ***
  OPENAI_API_KEY: ***
  SLACK_BOT_TOKEN: ***
  CONFLUENCE_TEST_SPACE_URL: ***
  CONFLUENCE_USER_NAME: ***
  CONFLUENCE_ACCESS_TOKEN: ***
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: ***/onyx
Error: One or more upstream jobs failed or were cancelled.

5 jobs failed running on non-Blacksmith runners.


Summary: 4 successful workflows, 5 failed workflows

Last updated: 2025-09-26 23:28:53 UTC


    if message_id is None:
        raise ValueError("Message ID is required")
    if packet != {"type": "event"}:


⚠️ Code Issue

Lines: 886-887

        if packet != {"type": "event"}:
            print(packet)

This print statement is debug code and should not be present in production. It should be removed or replaced with a proper logging call.
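A minimal sketch of the suggested replacement, using the standard `logging` module (the function name and packet shape are illustrative, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)

def handle_packet(packet: dict) -> None:
    # Replace the debug print with a leveled, filterable log call
    if packet != {"type": "event"}:
        logger.debug("unexpected packet: %s", packet)
```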

        message_id=message_id,
        error_msg=error_msg,
        top_documents=top_documents,
        cited_documents={},


⚠️ Code Issue

Lines: 902-905

        cited_documents={},
        message_id=0,
        error_msg=None,
        top_documents=[],

These hardcoded default values indicate a significant loss of functionality. The gather_stream function no longer extracts citations, message ID, error messages, or top documents from the stream, which were previously handled. This is a critical functional regression.
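The reviewer's point can be sketched as an accumulator over the stream. The packet shapes and field names below are assumptions for illustration, not the project's actual Packet types:

```python
def gather_stream(packets: list) -> dict:
    # Accumulate response fields from streamed packets instead of
    # hardcoding defaults (hypothetical packet dicts, not real types).
    message_id = None
    error_msg = None
    top_documents: list = []
    cited_documents: dict = {}
    content: list = []
    for pkt in packets:
        kind = pkt.get("type")
        if kind == "message_start":
            message_id = pkt.get("message_id")
        elif kind == "message_delta":
            content.append(pkt.get("content", ""))
        elif kind == "citation":
            cited_documents[pkt["index"]] = pkt["document_id"]
        elif kind == "documents":
            top_documents = pkt.get("documents", [])
        elif kind == "error":
            error_msg = pkt.get("message")
    return {
        "message": "".join(content),
        "message_id": message_id,
        "error_msg": error_msg,
        "top_documents": top_documents,
        "cited_documents": cited_documents,
    }
```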

@@ -0,0 +1 @@
# Turn module for chat functionality


⚠️ Code Issue

Lines: 1-1

# Turn module for chat functionality

This comment could be replaced by a package docstring for better documentation practices and accessibility. While not strictly problematic, it's a missed opportunity for more robust documentation.

@@ -0,0 +1 @@
# Infrastructure module for chat turn orchestration


⚠️ Code Issue

Lines: 1-1

# Infrastructure module for chat turn orchestration

Using a # comment for module-level documentation is less standard than using a module docstring. Docstrings are accessible via help() and are used by documentation generation tools, improving maintainability and discoverability.
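The suggested change is mechanical; the snippet below demonstrates the practical difference by executing a tiny module and reading its docstring back at runtime (a `# comment` would leave `__doc__` as `None`):

```python
# A module docstring, unlike a leading # comment, is stored on the
# module object and is visible to help() and documentation generators.
module_source = '"""Infrastructure module for chat turn orchestration."""\n'
namespace: dict = {}
exec(compile(module_source, "<demo-module>", "exec"), namespace)
doc = namespace["__doc__"]
```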

    )
    elif packet_type == "response.output_item.done":
        return SectionEnd(type="section_end")
    packet_class(**filtered_data)


⚠️ Code Issue

Lines: 157-158

            packet_class(**filtered_data)
            return packet_class(**filtered_data)

The line packet_class(**filtered_data) creates an instance of packet_class but immediately discards it, as the very next line creates another identical instance which is then returned. This is redundant and inefficient.
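The fix is simply to drop the discarded construction, e.g.:

```python
def build_packet(packet_class, filtered_data: dict):
    # Construct the packet exactly once; the PR built a throwaway
    # instance on the line before the return, doubling the work.
    return packet_class(**filtered_data)
```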

        emitter = Emitter(bus)
        current_context = contextvars.copy_context()
        dependencies.emitter = emitter
        t = threading.Thread(


⚠️ Code Issue

Lines: 39-56

        t = threading.Thread(
            target=current_context.run,
            args=(
                turn_func,
                messages,
                dependencies,
            ),
            daemon=True,
        )
        t.start()
        while True:
            pkt: Packet = emitter.bus.get()
            print("packet", pkt)
            if pkt.obj == OverallStop(type="stop"):
                yield pkt
                break
            else:
                yield pkt

This code block contains several issues: a debugging print statement, a potential bug in the OverallStop comparison that could lead to an infinite loop, and a critical reliability flaw where unhandled exceptions in the turn_func thread will cause the emitter.bus.get() call to block indefinitely, hanging the generator. Additionally, the while loop's if/else structure can be simplified for better readability.
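A hedged sketch of those suggestions using only stdlib names: run the turn function on a worker thread, forward any exception to the consumer, always enqueue a sentinel so the reader cannot block forever, and use identity comparison against the sentinel rather than `==` against a freshly constructed `OverallStop`. The `turn_func` signature here is an assumption; the PR passes an `Emitter` via `dependencies` instead of a raw callable.

```python
import queue
import threading

_SENTINEL = object()

def run_turn(turn_func, messages, dependencies):
    """Run turn_func on a worker thread and yield the packets it emits."""
    bus: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            # Hypothetical signature: the worker emits packets via bus.put
            turn_func(messages, dependencies, bus.put)
        except BaseException as exc:
            # Forward the failure so the consumer raises instead of
            # blocking forever on bus.get()
            bus.put(exc)
        finally:
            # The sentinel guarantees the consumer loop terminates, and
            # identity comparison avoids relying on a packet's __eq__
            bus.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while (item := bus.get()) is not _SENTINEL:
        if isinstance(item, BaseException):
            raise item
        yield item
```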


def default_packet_translation(ev: object) -> PacketObj | None:
    if isinstance(ev, RawResponsesStreamEvent):
        # TODO: might need some variation here for different types of models


⚠️ Code Issue

Lines: 11-11

# TODO: might need some variation here for different types of models

The TODO comment indicates that the current implementation is incomplete or might require future adjustments for different model types. This suggests a known limitation or an area that needs further development, impacting the long-term maintainability and completeness of the feature.

        # OpenAI packet translator
        obj: PacketObj | None = None
        if ev.data.type == "response.created":
            obj = MessageStart(type="message_start", content="", final_documents=None)


⚠️ Code Issue

Lines: 15-15

obj = MessageStart(type="message_start", content="", final_documents=None)

The content and final_documents fields are hardcoded to an empty string and None respectively for response.created events. While this might be the desired initial state, it assumes that response.created events never carry initial content or document references. If future RawResponsesStreamEvent data for response.created includes such information, this hardcoding would lead to data loss or incorrect message initialization.
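One hedged way to guard against that is to read the fields from the event with the current values as fallbacks; the field names below are hypothetical and the dict stands in for the PR's `MessageStart` packet:

```python
def message_start_from_event(data: dict) -> dict:
    # Prefer values carried by the event; fall back to the current
    # hardcoded defaults only when the event omits them
    return {
        "type": "message_start",
        "content": data.get("content", ""),
        "final_documents": data.get("final_documents"),
    }
```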

        # TODO: might need some variation here for different types of models
        # OpenAI packet translator
        obj: PacketObj | None = None
        if ev.data.type == "response.created":


⚠️ Code Issue

Lines: 14-19

if ev.data.type == "response.created":
    obj = MessageStart(type="message_start", content="", final_documents=None)
elif ev.data.type == "response.output_text.delta":
    obj = MessageDelta(type="message_delta", content=ev.data.delta)
elif ev.data.type == "response.output_item.done":
    obj = SectionEnd(type="section_end")

The function only handles three specific ev.data.type values. Any other RawResponsesStreamEvent type will cause obj to remain None (initialized on line 13) and the function will silently return None. This lack of explicit handling for unknown or future event types can lead to unexpected behavior or silent failures downstream without any indication of what was missed.
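A sketch of making the fallthrough explicit: dispatch known event types through a table and log anything unrecognized. The event type strings come from the snippet above; the dict shapes and the logging fallback are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

# Map known event types to translators; the dicts stand in for the
# PR's MessageStart/MessageDelta/SectionEnd packet objects
_TRANSLATORS = {
    "response.created": lambda ev: {"type": "message_start", "content": ""},
    "response.output_text.delta": lambda ev: {"type": "message_delta", "content": ev["delta"]},
    "response.output_item.done": lambda ev: {"type": "section_end"},
}

def translate(ev: dict):
    handler = _TRANSLATORS.get(ev["type"])
    if handler is None:
        # Surface unknown event types instead of silently returning None
        logger.warning("unhandled stream event type: %s", ev["type"])
        return None
    return handler(ev)
```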

    data: list[dict[str, dict[str, str]]] | None = None,
    remote_dataset_name: str | None = None,
    provider: EvalProvider = get_default_provider(),
    no_send_logs: bool = False,


⚠️ Code Issue

Lines: 103-103

no_send_logs: bool = False,

Using negative phrasing for boolean parameters like no_send_logs can sometimes lead to confusion, especially when combined with negation in conditional logic. A positive name like send_logs would be more intuitive.
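A sketch of the positively named parameter; the function body and return values are illustrative only:

```python
def run_eval(send_logs: bool = True) -> str:
    # Positive phrasing avoids double negatives like `if not no_send_logs:`
    if send_logs:
        return "logs sent"
    return "logs suppressed"
```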
